A PAPI Implementation for BlueGene
Abstract
The IBM BlueGene/L (BG/L) supercomputer holds 3 of the top 10 rankings on the 26th TOP500 list of LINPACK performance. The system's design is novel in many respects compared with more traditional high-performance computing systems. When developing system libraries, as well as when tuning application code for BG/L, it is essential to be able to measure the impact of code modifications and algorithmic choices. PAPI is a platform-neutral, user-level library that accommodates programmers’ need to access on-chip performance counters. This paper describes the implementation of the low-level kernel interface for hardware performance counter access on BG/L and the accompanying PAPI implementation.
Similar resources
Implementing MPI on the BlueGene/L Supercomputer
The BlueGene/L supercomputer will consist of 65,536 dual-processor compute nodes interconnected by two high-speed networks: a three-dimensional torus network and a tree topology network. Each compute node can only address its own local memory, making message passing the natural programming model for BlueGene/L. In this paper we present our implementation of MPI for BlueGene/L. In particular, we...
Fourier Transforms for the BlueGene/L Communication Network
A computational kernel of particular importance for many scientific applications is the Fast Fourier Transform (FFT) of multi-dimensional data. A fundamental challenge is the design and implementation of such parallel numerical algorithms to utilise efficiently thousands of nodes. The BlueGene/L is a massively parallel high performance computer organised as a three-dimensional torus of compute ...
Obtaining Hardware Performance Metrics for the BlueGene/L Supercomputer
Hardware performance monitoring is the basis of modern performance analysis tools for application optimization. We are interested in providing such performance analysis tools for the new BlueGene/L supercomputer as early as possible, so that applications can be tuned for that machine. We are faced with two challenges in achieving that goal. First, the machine is still going through its final de...
Enabling Dual-Core Mode in BlueGene/L: Challenges and Solutions
BlueGene/L is a massively parallel computer system with 65,536 dual-processor compute nodes. The peak performance of BlueGene/L is in excess of 360 TFLOP/s if both processor cores in a node are used for computation. The main challenge of deploying this dual-core mode of operation is that the L1 caches in each core are not hardware coherent. This forces a software-based approach to cache coheren...
Design and Analysis of the BlueGene/L Torus Interconnection Network
BlueGene/L (BG/L) is a 64K (65,536) node scientific and engineering supercomputer that IBM is developing with partial funding from the United States Department of Energy. This paper describes one of the primary BG/L interconnection networks, a three dimensional torus. We describe a parallel performance simulator that was used extensively to help architect and design the torus network and presen...